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Background to the Invention 

1 . Field of the invention 

The present invention relates to a simplified instruction set microprocessor (RISC). In particular, a 
microprocessor tailored toward and capable of implementing bytecoded virtual machines at speeds 
comparable to existing language-specific microprocessors, but an order of magnitude smaller in terms of 
the number of transistors necessary to implement the microprocessor compared to existing designs, such as 
Java chips. In a preferred embodiment, the microprocessor will be used to implement a version of the Java 
Virtual Machine. The small size and resulting low-power consumption make the microprocessor especially 
attractive fui use in embedded appliculiuua. 

2. Description of prior art 

High level language oriented computer architectures have been known for many years, but until now few of 
them have been successful commercially. Currently, the most popular computer architectures are either 
general-purpose Complex Instruction Set Computers (CISC) such as the Intel 80x86 family of 
microprocessors, or general-purpose Reduced Instruction Set Computers (RISC). 

The success of the Java programming language, and the demands, which the Java environment places on an 
execution platform, can change this situation. This means that language specific architectures may become 
popular, since a Java-oriented architecture is bound to be more efficient in executing Java than a general- 
purpose microprocessor. 

Java is a young technology and has been designed to run on a variety of computing platforms, but the 
benefits of using dedicated special-purpose architectures have been apparent from the start. SUN 
Microsystems, the originator of Java, have developed a microprocessor core called picoJava. This core is 
targeted toward the high-end of the computer market, i.e. it is envisaged to be used in large desktop 
computer type applications. It is large, complex and power-hungry, making it unsuitable for smaller, 
embedded type of applications. Certain similarities between the Java Virtual Machine and the Forth 
programming language have led manufacturers of Forth chips to re-brand them as Java chips. While some 
Forth chips are better at implementing Java than other general-purpose microprocessors, differences 
between the Java Virtual Machine and Forth ensure that a dedicated Java microprocessor will outperform a 
Forth microprocessor. 

Embedded systems often operate under stringent real-time constraints. In particular, the interrupt latency of 
a processor is a critical parameter. With complex microcoded instructions, interrupt latency can be on the 
order of several dozen or even hundreds of clock cycles. 



Summary of the Invention 

It is an object of this invention to provide a high-performance, virtual machine tailored microprocessor with 
a reduced transistor count compared to other processors, such as existing Java processors. 

It is also an object of this invention to provide a high-performance microprocessor, which has low interrupt 
latency necessary in many real-time embedded systems. 

It is a further object of the invention to provide extremely good code density for high-level language 
generated programs. 

It is a further object of the invention to simplify the translation process from a machine independent 
bytecode representation of a program, such as the Java Virtual Machine bytecodes, to the microprocessor's 
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instruction set, so that "on the fly" application program loaders, such as the Java class loaders can be 
implemented at minimal cost. 

It is a further object of the invention to provide a means for the microprocessor to communicate with other 
microprocessors via a local area network, and to enable software upgrades to be performed remotely over 
the network. 

It is a further object of the invention to provide the means for the microprocessor to interface directly to a 
dedicated slave co-processor (such as a numeric co-processor, a digital signal processor (DSP), a special 
purpose communication processor, etc.) so that special purpose systems can be easily implemented. 

The above and related objectives may be achieved by the use of the novel, low cost microprocessor herein 
disclosed. In accordance with one aspect of the invention, a microprocessor system in accordance with this 
invention has a central processing unit, an 8-bit wide bytecode memory, a 32-bit data memory and busses 
connecting the central processing unit with the two memories. 

In accordance with another aspect of the invention, the microprocessor system contains an arithmetic logic 
unit and a data stack. The top two elements of the data stack are connected to the inputs of the arithmetic 
logic unit and the output of the arithmetic logic unit is connected to an internal data bus. 

In accordance with yet another aspect of the invention, the top three elements of the data stack contain 
special-purpose circuits, which enable the efficient (single cycle) execution of seven primitive stack 
operations directly in hardware. The remaining data stack elements are simple shift registers. 

In accordance with another aspect of the invention, the microprocessor has a means for distinguishing a 
"fast call" instruction from a regular instruction by utilising circuitry, which decodes a particular bit of the 
8-bit bytecode. In a preferred embodiment of the microprocessor, this would be the most significant bit of 
the 8-bit bytecode. 

In accordance with another aspect of the invention, the microprocessor has a means of recognising a "fast 
return" instruction "folded" with a regular instruction, by utilising circuitry, which decodes the "fast call" 
bit in the 8-bit bytecode and another dedicated bit in the 8-bit bytecode. In a preferred embodiment of the 
microprocessor, this would be the second most significant bit in the 8-bit bytecode. 

In accordance with another aspect of the invention, the microprocessor contains a dedicated register called 
the Parameter Pool Pointer, together with associated circuitry and several dedicated instructions for storing 
and accessing 32-bit quantities in data memory using short (8-bit) offsets. This mechanism allows the 
efficient implementation of local (dynamic) variables in block and object oriented languages. 

In accordance with another aspect of the invention, the microprocessor contains a dedicated register called 
the Global Constant Pool Pointer, together with associated circuitry and dedicated instructions for storing 
and accessing 32-bit quantities in data memory using short (8-bit) offsets — This mechanism allows the, 
efficient implementation of 32-bit literal constants, which are global to all execution contexts, using only an 
8-bit bytecode extension. 

In accordance with another aspect of the invention, the microprocessor contains a dedicated register called 
the Local Constant Pool Pointer, together with associated circuitry and dedicated instructions for storing 
and accessing 32-bit quantities in data memory using short (8-bit) offsets. This allows the efficient 
implementation of 32-bit literal constants, which are local to a particular execution context. 

In accordance with another aspect of the invention, the microprocessor contains a dedicated register called 
the Extension Stack Pointer, together with dedicated circuits, which is used to spill data stack elements into 
data memory, and to refill the data stack from data memory. 

In accordance with another aspect of the invention, the microprocessor contains dedicated circuitry, to 
efficiently implement a subroutine call via an in-memory jump table, which is essential for an efficient 
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implementation of dynamic method dispatch in object oriented programming languages. In a preferred 
embodiment of this invention this language would be the Java programming language. 

In accordance with another aspect of the invention, the microprocessor contains dedicated circuitry and 
instruction to improve the efficiency of exception handlers. 

In accordance with another aspect of the invention, the microprocessor contains dedicated hardware 
mechanisms to allow the efficient implementation of a variety of garbage-collection algorithms, which are 
essential in many object-oriented systems, such as Java. 

In accordance with yet another aspect of the invention, the microprocessor contains circuits for interfacing 
the microprocessor to a data network and to allow the microprocessor's software to be dynamically 
upgraded over that network. 

In accordance with another aspect of the invention, the microprocessor contains a means for 
communicating with a special-purpose co-processor, such as a DSP processor or a math processor. The co- 
processor can be integrated on the same silicon die as the microprocessor, or can be located off-chip. 

The attainment of the foregoing and related objects, advantages and features of the invention should be 
more apparent to those skilled in the art, after review of the following more detailed description of the 
invention. 

Detailed Description of the Invention 

The microprocessor of this invention has the following important distinguishing features: 

Low transistor count and low power dissipation for use in small embedded systems 
Unique two-level architecture involving a fast micro machine and a slower macro machine 
Modified Harvard architecture with an 8-bit wide bytecode memory and a 32-bit wide data memory. 
Bytecode (0-operand) instruction set for maximum code density. 
32-bit internal architecture. 

A 32-bit wide hardware operand stack coupled to the ALU, with automatic fill/spill unit. 
A 32-bit wide hardware return (subroutine) stack. 

RISC-like instruction set, with most instructions executing in a single cycle. 
No pipelining. 

Zero-overhead subroutine return instruction which can be "folded" with other instructions. 
128 user-programmable bytecodes facilitating the efficient creation of virtual machines. 
Hardware support for the efficient execution of high-level languages, such as Java. 
Global and local constant pools, allowing the specification of 32-bit immediate constants using 8-bit 
offsets in the bytecode stream. 



Parameter pool, allowing the efficient implementation of local variables. 
Hardware and instructions for dynamic method dispatch 

Hardware and instructions for the efficient implementation of software exceptions 
Hardware support for garbage collection algorithms 

Interface to a local area network for dynamic upgrading of the application software 
Hardware interface to a variety of on- and off-chip co-processors 

Figure 1 shows the block diagram of the microprocessor (10). It is seen that the microprocessor has a 
simple architecture. The central processing unit is connected to two memories: an 8-bit wide Bytecode 
Memory (100) and a 32-bit wide Data Memory (114). The CPU contains a main Arithmetic Logic Unit 
(103) and two hardware stacks: the Return Stack (104) and the Data Stack (108). The top two elements of 
the Data Stack (108) are connected directly to the inputs of the ALU (103). The CPU also contains an 8-bit 
Instruction Register (101) for latching the currently executing instruction bytecode, and an Immediate 
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Operand circuit (102) which connects the output of the Bytecode Memory to the CPU's Internal Data Bus 
(115). The CPU also contains a Program Counter register (105), which is connected to the input of the 
Bytecode Memory Address ALU (106) and also to the input and output ports of the Return Stack (104). In 
addition to the above, the CPU also contains four dedicated address registers, namely the Global Constant 
Pool Pointer (109), the Local Constant Pool Pointer (110), the Extension Stack Pointer (111) and the 
Parameter Pool Pointer (112). The read ports of the address registers are connected to the input of the Data 
Memory Address ALU (113). 

Figure 2 shows the block diagram of the fast subroutine call mechanism. The microprocessor instruction 
decode logic includes a circuit which distinguishes a bit (150) (the most significant bit in a preferred 
embodiment of the invention) in the 8-bit instruction register (101) which latches a bytecode fetched from 
bytecode memory (100). If the bit (150) is active, the microprocessor program counter is loaded with the 
nntpnt nf the Address Generator circuit (151} which uses the remaining 7 bits of the bytecode to produce an 
address. In a preferred embodiment of the invention, the circuit (151) shifts the low 7 bits left by two bits. 
The program counter register load is determined by the control signal on gate (152). At the same time, the 
current value of the program counter (105) is stored in the on-chip return stack (104). This sequence of 
actions is performed in a single clock cycle, and is equivalent to a fast subroutine call to one of 128 
possible locations. 

Figure 3 shows the block diagram of the circuit implementing the folding of a subroutine return operation 
with a regular instruction. The microprocessor instruction decode logic includes a circuit which tests two 
bits (150 and 200) in the 8-bit instruction register (101) which latches a bytecode fetched from bytecode 
memory, and controls a circuit (201), which reloads the program counter register from the top of the return 
stack. In a preferred embodiment of the invention the two bits would be the two most significant bits in the 
8-bit bytecode This action is performed in parallel with normal instruction execution, and means that any 
(regular) instruction of the microprocessor can be "folded" with a return from subroutine operation, 
effectively providing a zero-overhead operation. 

The fast call/fast return features of the microprocessor allow very short instruction sequences to be encoded 
as 8-bit user-defined bytecodes. This feature is a key to providing good code density and also good 
interrupt latency, since the "macro" instructions are composed of a sequence of simple (RISC-like) 
machine instructions of the microprocessor, and can be interrupted without problems. 

The instruction decode logic of the microprocessor includes a circuit, shown in Figure 4, which allows a 
bytecode to be followed by a single 8-bit immediate value. This immediate value is read from bytecode 
memory (100) and latched in the immediate operand module (102). Depending on the bytecode, this 8-bit 
value can be combined with one of the address registers (109, 110, 111, 112), shown in Figure 4 as (250), 
to provide an address into data memory, from which a full 32-bit immediate value is fetched. 

The organisation of the Data Stack is shown in Figure 5. The top three data stack entries (300) are different 
frum the remaining stack en tr ies (301) and a r c pr ovided with s p ecial -p urpose circuits, which enable them to 
execute seven primitive stack manipulation operations directly in hardware in one clock cycle. The top two 
elements of the Data Stack are connected to the inputs of the microprocessor's Arithmetic Logic Unit 
(103). The stack manipulation primitives have been selected to either directly correspond to some Java 
Virtual Machine stack manipulation instructions, and to allow the composition of the remaining Java 
Virtual Machine instructions from two or three primitives. The primitive operations are: 

POP Remove the top stack element 

DUP Duplicate the top stack element 

OVER Copy the second stack element over the top stack element 

SWAP Swap the top two stack elements 

LROT Rotate the top 3 stack elements left 

RROT Rotate the top 3 stack elements right 

TOVER Copy the third stack element over the top stack element 
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The remaining Data Stack elements (301) are implemented as a simple array of 32xn shift registers. 

Object-oriented languages, such as Java, require the efficient implementation of local (or dynamic) 
variables. The microprocessor here described supports the concept of a Parameter Pool, A parameter pool 
is an area of data (32-bit) memory, with individually addressable locations. The addresses of the individual 
locations are small positive integers. The Parameter Pool is implemented using a dedicated pointer register 
called the Parameter Pool Pointer register. Hardware mechanisms are provided for transferring data from 
the data stack to the Parameter Pool, and for accessing and storing individual elements in the pool using 
either a short (8-bit) offset in the bytecode stream, or a 32-bit offset from the top of the data stack. 

A 32-bit architecture requires 32-bit literal constant operands to be included in the instruction set. Current 
practice in bytecoded architectures is to embed the literal constant in the bytecode stream. This increases. 
the code size, and makes the instruction decoding circuits more complex. The approach taken with the 
present microprocessor is the provision of constant pools, which hold the values of the literal constants. 
Two pools are provided for each execution context. A Global Constant Pool contains literal constants 
common to all execution contexts (the most commonly occurring constants, plus constants for the operation 
of the virtual machine). A Local Constant Pool contains literal constants specific to a particular execution 
context. The constant pools are implemented using two dedicated pointer registers: the Global Constant 
Pool Pointer register and the Local Constant Pool Pointer register. Hardware mechanisms are provided to 
enable the initialisation of the constant pools and the efficient retrieval of any constant from the pools using 
an 8-bit offset in the bytecode stream, or a 32-bit offset from the top of data stack. 

The on-chip data stack is used for expression evaluation. In a preferred embodiment of the invention the 
depth is equal to 8, which is adequate for most situations. The (relatively) shallow depth of the data stack 
makes context switching faster, since in the case in question only at most 8 elements need to be spilled into 
memory. In some cases, the number of elements on the data stack may exceed 8, causing stack overflow. 
To deal with this situation a dedicated register, called the Extension Stack Pointer is provided, together 
with dedicated circuits for loading/storing the bottom element of the data stack in data memory, according 
to the address stored in the Extension Stack Pointer. 

Object oriented languages use dynamic method dispatch very extensively. The present microprocessor 
implements dynamic method dispatch using subroutine calls via a jump table. This operation is performed 
using a built-in instruction of the microprocessor. 

Many modern programming languages, such as Java, include facilities for defining exception handlers. The 
present microprocessor has instructions and dedicated circuits to improve the efficiency of the 
implementation of exception handlers. 

Modern high-level programming languages, such as Java, use sophisticated memory management 
techniques based on garbage collection. The present microprocessor contains circuits, which allow the 
efficient im p lementation of a variety of garbage collection algorithms, for different application areas. 

Since the "semantic gap" between the microprocessor here described and typical stack-based virtual 
machine architectures is small, it is possible to implement very simple dynamic loaders, which allow 
machine-independent application code (such as software upgrades) for the microprocessor to be 
downloaded over a local data network. In a preferred embodiment of the invention, the machine- 
independent code would be the Java class file format. For this reason, the microprocessor contains a unit 
for connecting the processor to a local area network. 

Many modern embedded systems consist of a control module and a data-processing module. The control 
module is usually implemented by a general-purpose microprocessor, while the data processing part is 
implemented by a dedicated hardware processor. Depending on the application area, the dedicated 
hardware processor can be a digital signal processing (DSP) processor, or a special-purpose math co- 
processor. To allow the present microprocessor to be used in such situations, special purpose instructions 
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have been provided in the microprocessor, together with dedicated circuitry enabling the processor to 
directly interface to a wide variety of special-purpose co-processors. 
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CLAIMS 

What is claimed is: 

1 . A microprocessor system, consisting of a central processing unit, 8-bit wide instruction memory, 32-bit 
wide data memory and a hardware stack connected via an internal bus to the program counter register, 
together with an instruction decode unit which includes a circuit for detecting the presence of a 
distinguished bit in the 8-bit bytecode, together with a circuit for loading the remaining bits of the 
bytecode shifted left by a number of bits into the microprocessor's program counter register, while at 
the same time storing the current value of the program counter register on the aforementioned stack. 

2. The microprocessor system of claim 1, additionally comprising a circuit in the instruction decode logic 
section which tests two distinguished bits, one of which is the bit described in claim 1, in an 8-bit 
bytecode, controlling u uiicuit which luuib tliu progruni cumitei mgiiitui with u value from the uluek 
mentioned in claim 1 . 

3. The microprocessor system of claim 2, additionally comprising two address registers, additional 
circuitry and instructions allowing the fetch of 32-bit immediate constants using an 8-bit offset 
embedded in the bytecode instruction stream. 

4. The microprocessor system of claim 3, additionally comprising an address register, additional circuitry 
and instructions allowing the fetch and store of 32-bit values using an 8-bit offset embedded in the 
bytecode instruction stream. 

5. The microprocessor system of claim 4, additionally comprising an instruction for the efficient 
execution of a subroutine calls via an in-memory jump table. 

6. The microprocessor system of claim 5, additionally comprising an instruction and dedicated hardware 
circuits for the efficient execution of structured exception handlers of modern high-level languages. 

7. The microprocessor system of claim 6, additionally comprising instructions and dedicated hardware 
for the efficient implementation of a variety of garbage-collection algorithms. 

8. The microprocessor system of claim 7, additionally comprising instructions a hardware interface to a 
local area network. 

9. The microprocessor system of claim 8, additionally comprising circuits and instructions for interfacing 
with on- or off-chip co-processors such as numeric or digital signal processing co-processors. 

10. The microprocessor system of claim 3, additionally comprising of a 32-bit wide data stack, whose top 
three elements are provided with a hardware means for executing seven distinct operations on the said 
three elements and whose remaining elements are implemented as an array of simple shift registers. 
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Abstract 

High Performance Low Cost Microprocessor 



A microprocessor (10) incorporating a modified Harvard architecture connected to two memory banks 
(100) and (114). The main CPU also includes two pushdown stacks (104) and (108). One of the stacks has 
its top two items connected directly to an arithmetic-logic unit (103). In addition hardware is provided to 
perform seven operations on the top three stack elements. The instruction length of the microprocessor is 8 
bits and most instructions execute in a single clock cycle. A unique feature of the instruction set is that 128 
of the 256 possible bytecodes are user-programmable. Each regular instruction can be "folded" with the 
"return from subroutine" instruction, inis allows the efficient implementation of a wide variety of virtual 
machines, such as the Java Virtual Machine. The microprocessor also contains dedicated registers and 
circuits for the efficient implementation of dynamic variables (112), 32-bit immediate constants (109) and 
(110), interfaces to dedicated co-processors and interfaces to local area networks allowing dynamic 
upgrades of application software. 
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